Skip to content

Conversation

sadra-barikbin
Copy link
Collaborator

@sadra-barikbin sadra-barikbin commented Mar 29, 2023

Description:
Mean Average Precision metric for detection in COCO dataset.
Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

@github-actions github-actions bot added the module: metrics Metrics module label Mar 29, 2023
@AlexanderChaptykov
Copy link
Contributor

I reckon it'd be fantastic to get rid of these complex nested structures and break up large functions by spreading their functionality across smaller functions.
image

@sadra-barikbin
Copy link
Collaborator Author

Hi @AlexanderChaptykov , thanks for your comment. Yes that would be nice, but I need a short time to add a few commits beforehand.

@sadra-barikbin
Copy link
Collaborator Author

@vfdev-5 , finally It's ready to get reviewed.

i.e. ObjectDetectionMap and its dependencies
Removed allow_multiple...
Renamed average_operand
Renamed _measure_recall... to _compute_recall...
Docs has some nasty errors
Removed generic detection logics. Just that of the COCO is remained
Tests are updated
@sadra-barikbin
Copy link
Collaborator Author

Hi @vfdev-5 , finally this seems to be ready for review.

failure reasons:

  • MPS tests: Cummax has yet to be implemented for Pytorch MPS backend. User could set fallback MPS to CPU environment variable. Is it OK?
  • Unit tests / MacOS: It raises NotEnoughMemoryError while its MPS memory is empty. I could not figure out its reason. Any idea?

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Sep 4, 2024

@sadra-barikbin sounds great!

MPS tests: Cummax has yet to be implemented for Pytorch MPS backend

Let's do the following:

# precision_integrand = precision.flip(-1).cummax(dim=-1).values.flip(-1)


if precision.device.type == "mps":
    # Manual fallback to CPU if precision is on MPS due to the error:
    # NotImplementedError: The operator 'aten::_cummax_helper' is not currently implemented for the MPS device
    device = precision.device
    precision_integrand = precision.flip(-1).cpu()
    precision_integrand = precision_integrand.cummax(dim=-1).values
    precision_integrand = precision_integrand.to(device=device).flip(-1)
else:
    precision_integrand = precision.flip(-1).cummax(dim=-1).values.flip(-1)

Unit tests / MacOS: It raises NotEnoughMemoryError while its MPS memory is empty

I haven't seen that previously. If we have large tensors in the test, we can skip those tests on MPS device.

Copy link
Collaborator

@vfdev-5 vfdev-5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments.
Thanks a lot for still working on this PR, Sadra!

Copy link
Collaborator

@vfdev-5 vfdev-5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor updates and let's try to merge this great PR, @sadra-barikbin !
Thanks a lot for finally making this possible!

self,
iou_thresholds: Optional[Union[Sequence[float], torch.Tensor]] = None,
rec_thresholds: Optional[Union[Sequence[float], torch.Tensor]] = None,
num_classes: int = 91,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not this match MS Coco number of classes: 80 or 81

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. The model used in tests had 91 classes.

pytorch/vision#4613 (comment)

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Sep 9, 2024

Concerning failing MPS tests:

2024-09-05T21:11:39.5671470Z FAILED tests/ignite/metrics/vision/test_object_detection_map.py::test_compute[sample0] - RuntimeError: MPS backend out of memory (MPS allocated: 8.00 MB, other allocations: 1.02 MB, max allowed: 7.93 GB). Tried to allocate 256 bytes on shared pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

Let's skip the tests on MPS

@sadra-barikbin
Copy link
Collaborator Author

The link https://docs.wandb.ai/library/init apparently isn't valid anymore.

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Sep 26, 2024

We can replace it with : https://docs.wandb.ai/ref/python/init

@sadra-barikbin
Copy link
Collaborator Author

@vfdev-5 , any blocker on this?

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Oct 8, 2024

@sadra-barikbin can you please fix meanwhile the issue with idist.all_gather_tensors_with_shapes on Horovod?
I'll make another quick review pass

@tarbaig
Copy link

tarbaig commented Feb 21, 2025

Any blocker on this? If something is missing I can volunteer to finish / fix it.

I am using this and would love to see it merged ❤️

Copy link
Collaborator

@vfdev-5 vfdev-5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move forward with this PR
LGTM!
Thanks a lot @sadra-barikbin for this great addition !

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Feb 22, 2025

Any blocker on this? If something is missing I can volunteer to finish / fix it.

I am using this and would love to see it merged ❤️

@tarbaig Thanks for the feedback. I think we can merge this PR. It remains to fix the problems with idist.all_gather_tensors_with_shapes on Horovod. I have a PR improving the tests: #3302 but Horovod is still failing, so need to make a pass on that to understand why.

@vfdev-5 vfdev-5 merged commit b950b46 into pytorch:master Feb 26, 2025
15 of 17 checks passed
@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 18, 2025

@sadra-barikbin can you please fix this issue with older pytorch versions (v1.10-v1.13.1):

2025-02-27T00:30:03.9316464Z         precision_integrand = precision_integrand.take_along_dim(
2025-02-27T00:30:03.9316653Z >           rec_thresh_indices.where(rec_thresh_indices != recall.size(-1), 0), dim=-1
2025-02-27T00:30:03.9316767Z         ).where(rec_thresh_indices != recall.size(-1), 0)
2025-02-27T00:30:03.9316988Z E       TypeError: where(): argument 'other' (position 2) must be Tensor, not int

@vfdev-5 vfdev-5 mentioned this pull request May 22, 2025
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs module: distributed Distributed module module: metrics Metrics module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants